An Introduction to HBMs and their Application to Category Learning
Repeatedly draw marbles from bags of black and white marbles with an unknown proportion of black marbles. If the draws so far have been mostly black:
\(\rightarrow\) High chance of the next marble also being black!
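This intuition can be made concrete with conjugate Beta-Binomial updating for a single bag. A minimal Python sketch; the uniform Beta(1, 1) prior and the counts below are illustrative assumptions, not values from the text:

```python
# Conjugate Beta-Binomial updating for a single bag -- a minimal sketch.
# The Beta(1, 1) prior and the counts are illustrative assumptions.
alpha_prior, beta_prior = 1.0, 1.0

# Observe y black marbles in n draws from the bag.
y, n = 9, 10

# Conjugacy: the posterior over the bag's black-marble proportion theta
# is Beta(alpha_prior + y, beta_prior + n - y).
alpha_post = alpha_prior + y
beta_post = beta_prior + n - y

# The posterior predictive probability that the next marble is black
# equals the posterior mean of theta.
p_next_black = alpha_post / (alpha_post + beta_post)
print(round(p_next_black, 3))  # 0.833
```

After nine black draws out of ten, the predictive probability of another black marble is already high, matching the intuition above.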
Goal
We want to build a Bayesian model that reverse-engineers the mind’s reasoning about color distributions across bags.
We have \(N\) bags of marbles indexed by \(i\), where \(y_i\) is the number of black marbles observed and \(n_i\) is the total number of marbles drawn.
Level 1 – Data
\(d_i: \big\{y_i, n_i \big\}\)
Level 2 – Bag-specific distribution
\(y_i ~ \big| ~ n_i \sim \text{Binom}(n_i, \theta_i)\)
Level 3 – General knowledge about bags
\(\theta_i \sim \text{Beta}(\alpha, \beta)\)
Level 4 – Hyperparameters
\(\frac{\alpha}{\alpha + \beta} \sim \text{Unif}(0, 1)\)
\(\alpha + \beta \sim \text{Exp}(1)\)
(i.e., a reparameterization with mean \(\mu = \frac{\alpha}{\alpha + \beta}\) and precision \(\phi = \alpha + \beta\))
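Read top-down, the four levels define a generative process. A forward-simulation sketch in Python (the function name and defaults are hypothetical; this shows only the sampling direction, not inference):

```python
import random

random.seed(0)  # reproducible illustration

def simulate_bags(N=5, n_draws=10):
    """Forward-simulate the four-level marble hierarchy, top-down.
    (Hypothetical helper; name and defaults are illustrative.)"""
    # Level 4: hyperparameters mu = alpha/(alpha+beta) ~ Unif(0, 1)
    # and precision nu = alpha + beta ~ Exp(1).
    mu = random.uniform(0, 1)
    nu = random.expovariate(1.0)
    alpha, beta = mu * nu, (1 - mu) * nu
    bags = []
    for _ in range(N):
        # Level 3: bag-specific proportion of black marbles
        theta_i = random.betavariate(alpha, beta)
        # Level 2: number of black marbles among n_draws draws
        y_i = sum(random.random() < theta_i for _ in range(n_draws))
        # Level 1: the observable data d_i = {y_i, n_i}
        bags.append((theta_i, y_i))
    return mu, nu, bags
```

Because all bags share \(\alpha\) and \(\beta\), simulated bags are coupled: learning about some bags constrains beliefs about the others.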
Applying Bayes Formula to HBM
\[ \begin{gathered} \overbrace{P(\theta, \alpha, \beta ~ | ~ y)}^{\text{Posterior}} \propto \underbrace{P(\alpha, \beta)}_{\text{Hyperprior}} \overbrace{P(\theta ~ | ~ \alpha, \beta)}^{\text{Conditional Prior}} \underbrace{P(y ~ | ~ \theta, \alpha, \beta)}_{\text{Likelihood}} \end{gathered} \]
Posterior inference for \(\theta_i\) proceeds by integrating out \(\alpha\) and \(\beta\):
\[ \begin{align*} P(\theta_i ~ | ~ d_1, \dots, d_n) = \iint P(\theta_i ~ | ~ \alpha, \beta, d_i) P(\alpha, \beta ~ | ~ d_1, \dots, d_n) \,d\alpha \,d \beta \end{align*} \]
This integral is approximated numerically using Markov chain Monte Carlo (MCMC) methods, e.g., Hamiltonian Monte Carlo (HMC) in Stan:
// Beta-Binomial Hierarchical Model in Stan
data {
  int<lower=0> N;                         // Number of bags
  array[N] int<lower=0> n;                // Number of marbles drawn from each bag
  array[N] int<lower=0> y;                // Number of black marbles in each bag
}
parameters {
  real<lower=0, upper=1> mu;              // Hyperparameter: mean of the Beta distribution
  real<lower=0> phi;                      // Hyperparameter: precision of the Beta distribution
  array[N] real<lower=0, upper=1> theta;  // Bag-specific proportion of black marbles
}
transformed parameters {
  // Reparameterization of the Beta distribution
  real<lower=0> alpha = mu * phi;
  real<lower=0> beta = (1 - mu) * phi;
}
model {
  mu ~ uniform(0, 1);                     // Hyperprior for µ
  phi ~ exponential(1);                   // Hyperprior for ϕ
  theta ~ beta(alpha, beta);              // Conditional prior for θ
  y ~ binomial(n, theta);                 // Likelihood
}

How did the model solve the marble problem?
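The key is that the hyperparameter posterior transfers what was learned across bags. A small grid-approximation sketch in Python (not the Stan fit above; the observations, grid ranges, and resolution are illustrative assumptions) shows that a brand-new, never-sampled bag already gets a high predicted probability of black:

```python
import itertools
import math

def betabinom_logpmf(y, n, a, b):
    """log P(y | n, alpha, beta) with theta integrated out (Beta-Binomial)."""
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

# Illustrative observations: three bags, almost all draws black.
data = [(9, 10), (8, 10), (10, 10)]

# Grid over mu = a/(a+b) ~ Unif(0, 1) and nu = a + b ~ Exp(1);
# the grid ranges and resolution are arbitrary choices for this sketch.
log_post = {}
for mu, nu in itertools.product([i / 100 for i in range(1, 100)],
                                [j / 10 for j in range(1, 101)]):
    a, b = mu * nu, (1 - mu) * nu
    # log prior: the uniform is constant, Exp(1) contributes -nu.
    log_post[(mu, nu)] = -nu + sum(betabinom_logpmf(y, n, a, b) for y, n in data)

# Posterior predictive P(black) for a NEVER-sampled new bag is E[mu | data]:
# knowledge abstracted at Level 4 transfers to bags with zero observations.
m = max(log_post.values())
z = sum(math.exp(lp - m) for lp in log_post.values())
p_new_black = sum(mu * math.exp(lp - m) for (mu, _), lp in log_post.items()) / z
print(round(p_new_black, 2))  # high: the new bag inherits the mostly-black tendency
```

This is the payoff of the hierarchy: inference at the hyperparameter level lets observations from some bags inform predictions about bags never drawn from.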
A mother points to an unfamiliar object lying on the counter and tells her child that this is a pen.
Question
By which features do children generalize the concept of a pen and recognize future instances?
Shape Bias
The expectation that members of a category tend to be similar in shape, which is learned by the age of 24 months (Smith et al., 2002).
| Marble World | Cognitive World |
|---|---|
| Bag | Category (e.g., “Dax”) |
| Marble | Object Exemplar |
| Color (Black/White) | Feature Value (Round/Square, Red/Blue) |
The Structural Shift
Real objects aren’t just “Black or White.” They are multi-dimensional. We must expand the model from Binary (Beta-Binomial) to Multinomial (Dirichlet-Multinomial).
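The expansion is mechanical: the Beta prior over a binary proportion generalizes to a Dirichlet prior over a \(k\)-valued feature distribution. A minimal sketch, assuming a symmetric Dirichlet with concentration 1 (an illustrative choice, not from the text):

```python
# Sketch: Beta-Binomial -> Dirichlet-Multinomial. A feature dimension now
# takes k values instead of 2, so the Beta prior becomes a Dirichlet.
# The symmetric concentration alpha = 1 is an illustrative assumption.

def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of a multinomial's value probabilities under a
    symmetric Dirichlet(alpha) prior, given observed value counts."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

# Color counts for the exemplars of one category over k = 4 color values:
print(dirichlet_posterior_mean([3, 1, 0, 0]))  # [0.5, 0.25, 0.125, 0.125]
```

Setting \(k = 2\) recovers exactly the Beta-Binomial posterior mean used in the marble model.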
(Glassen & Nitsch, 2016; Griffiths et al., 2024; Kemp et al., 2007)
Training phase (feature values of eight exemplars across four categories):

| Feature | Obj 1 | Obj 2 | Obj 3 | Obj 4 | Obj 5 | Obj 6 | Obj 7 | Obj 8 |
|---|---|---|---|---|---|---|---|---|
| Category | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
| Shape | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
| Texture | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Color | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Size | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| Feature | 'Dax' | Object 1 | Object 2 | Object 3 |
|---|---|---|---|---|
| Category | 5 | ? | ? | ? |
| Shape | 5 | 5 | 6 | 6 |
| Texture | 9 | 10 | 9 | 10 |
| Color | 9 | 10 | 10 | 9 |
| Size | 1 | 1 | 1 | 1 |
After training, children (and the model) encounter a new object with the novel noun “dax”.
Task: Which of the three candidates with unknown category labels is most likely to be a dax?
Data based on Smith et al. (2002)
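The shape bias can be sketched as weighting each feature dimension by its within-category consistency in the training table (a hypothetical scoring rule for illustration, not the full hierarchical Dirichlet-Multinomial model):

```python
# Hypothetical shape-bias scoring sketch: weight each feature dimension by
# how uniform it was within the training categories, then score candidates
# by weighted feature matches with the 'dax' exemplar.

# Training table: 8 objects as (category, shape, texture, color, size).
train = [
    (1, 1, 1, 1, 1), (1, 1, 2, 2, 2),
    (2, 2, 3, 3, 1), (2, 2, 4, 4, 2),
    (3, 3, 5, 5, 1), (3, 3, 6, 6, 2),
    (4, 4, 7, 7, 1), (4, 4, 8, 8, 2),
]

def uniformity(dim):
    """Fraction of training categories whose members agree on feature `dim`."""
    cats = {}
    for obj in train:
        cats.setdefault(obj[0], []).append(obj[dim])
    return sum(len(set(v)) == 1 for v in cats.values()) / len(cats)

# Dimensions: 1 = shape, 2 = texture, 3 = color, 4 = size.
weights = {d: uniformity(d) for d in (1, 2, 3, 4)}
# Shape is perfectly consistent within categories; texture, color, and size
# are not, so only shape matches carry weight.

dax = (5, 5, 9, 9, 1)
candidates = {"Object 1": (None, 5, 10, 10, 1),
              "Object 2": (None, 6, 9, 10, 1),
              "Object 3": (None, 6, 10, 9, 1)}

scores = {name: sum(weights[d] * (obj[d] == dax[d]) for d in (1, 2, 3, 4))
          for name, obj in candidates.items()}
print(max(scores, key=scores.get))  # Object 1: the only shape match with dax
```

Object 1 wins because it shares the dax's shape, even though Objects 2 and 3 match it in texture or color: the learned second-order knowledge that shape is the reliable dimension drives generalization.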